Limitations of 1-Step TD: The Case for n-Step Methods
AI029 Lesson 7

Imagine a robot navigating a vast, dark 20-segment hallway to find a charging station at the end. With 1-step TD, this robot is remarkably "near-sighted." Even after a successful run, it only updates the value of the very last tile before the charger. It would take twenty successful trips for that "scent" of reward to reach the start of the hallway. This is the Efficiency Gap: information propagates too slowly for large state spaces.
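The hallway thought-experiment can be sketched in a few lines. This is an illustrative toy (the state layout, step-size, and rewards are assumptions, not from the lesson): after one full successful trip, 1-step TD has only moved value onto the tile adjacent to the charger.

```python
# Hypothetical 20-segment hallway: states 0..19, with a +1 reward for
# stepping off state 19 onto the charger. Parameters are illustrative.
N_STATES = 20
ALPHA, GAMMA = 0.5, 1.0
V = [0.0] * N_STATES

def run_episode(V):
    """Walk straight down the hallway, applying the 1-step TD update
    V(s) <- V(s) + alpha * (R + gamma * V(s') - V(s)) at each tile."""
    for s in range(N_STATES):
        if s < N_STATES - 1:
            reward, v_next = 0.0, V[s + 1]
        else:
            reward, v_next = 1.0, 0.0  # charger reached; episode ends
        V[s] += ALPHA * (reward + GAMMA * v_next - V[s])

run_episode(V)
# After one trip, only the last tile has picked up any value:
print(V[0], V[N_STATES - 1])  # → 0.0 0.5
```

Because each update looks only one step ahead, the reward "scent" creeps back exactly one tile per successful episode, matching the twenty-trip propagation described above.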

[Slide: The Spectrum of Bootstrapping — the n-step continuum, from λ = 0 (1-step TD) to λ = 1 (Monte Carlo), with complex backups in between]

Bridging the Gap: The n-step Method

By using an n-step return, we perform a "Complex Backup" that looks multiple steps into the future before bootstrapping. This creates a continuum between two extremes:

  • High Bias, Low Variance ($\lambda = 0$): 1-step TD bootstraps entirely off the current value estimate after a single reward. It is stable but crawls toward the truth, one state at a time.
  • Zero Bias, High Variance ($\lambda = 1$): TD(1) behaves like a Monte Carlo method for an undiscounted episodic task, waiting for the entire actual outcome before updating.
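The n-step return that interpolates between these extremes is $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V(S_{t+n})$. A minimal sketch (the reward lists and values below are made-up numbers for illustration):

```python
def n_step_return(rewards, v_bootstrap, gamma):
    """Compute G_{t:t+n} from rewards R_{t+1}..R_{t+n} and the
    bootstrapped estimate V(S_{t+n})."""
    g = v_bootstrap * gamma ** len(rewards)  # gamma^n * V(S_{t+n})
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r                # gamma^{k} * R_{t+k+1}
    return g

# n = 1 recovers the 1-step TD target:
print(n_step_return([0.0], 0.8, 0.9))            # ≈ 0.72
# A full episode with v_bootstrap = 0 recovers the Monte Carlo return:
print(n_step_return([1.0, 0.0, 2.0], 0.0, 0.9))  # ≈ 2.62
```

Passing more rewards before bootstrapping trades bias for variance, which is exactly the continuum the bullets above describe.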

Empirical Evidence (Figure 7.2)

When analyzed on a 19-state random walk, the data is clear: the "sweet spot" lies in the middle. Intermediate values of $n$ consistently achieve the lowest RMS error. There is a trade-off, however: as $n$ grows, the n-step return carries more variance, so the method becomes more sensitive to the step-size $\alpha$ and needs a smaller learning rate to stay stable.
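A small sketch of this kind of experiment, assuming a 19-state random walk with a +1 reward on the right exit and true values $i/20$ for state $i$ (episode counts, $\alpha$, and the choice of $n$ values below are illustrative, not the figure's exact settings):

```python
import random

random.seed(1)

N, GAMMA = 19, 1.0
TRUE_V = [i / (N + 1) for i in range(1, N + 1)]  # true values under random policy

def n_step_td(n, alpha, episodes):
    """n-step TD prediction on the random walk; returns RMS error vs TRUE_V."""
    V = [0.0] * (N + 2)  # indices 0 and N+1 are terminal states
    for _ in range(episodes):
        states, rewards = [(N + 1) // 2], [0.0]  # start in the middle; rewards[0] unused
        T, t = float('inf'), 0
        while True:
            if t < T:  # still acting: take a random step
                s = states[-1] + random.choice((-1, 1))
                states.append(s)
                rewards.append(1.0 if s == N + 1 else 0.0)
                if s == 0 or s == N + 1:
                    T = t + 1
            tau = t - n + 1  # time whose state estimate is updated now
            if tau >= 0:
                G = sum(GAMMA ** (k - tau - 1) * rewards[k]
                        for k in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:
                    G += GAMMA ** n * V[states[tau + n]]  # bootstrap
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return (sum((V[i + 1] - TRUE_V[i]) ** 2 for i in range(N)) / N) ** 0.5

for n in (1, 4, 16):
    print(n, round(n_step_td(n, alpha=0.4, episodes=10), 3))
```

Sweeping $n$ and $\alpha$ in a loop like this, averaged over many seeds, reproduces the U-shaped curves the figure describes, with intermediate $n$ at the bottom of the U.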